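We study the satellite downlink scheduling problem (SDSP) under breakpoint resume mode (SDSP-BRM). Compared with the traditional SDSP, in which each piece of imaging data must be downloaded completely in one pass, SDSP-BRM allows imaging data to be split into pieces that can be downloaded in different playback windows. By analyzing the characteristics of SDSP-BRM, we first formulate it as a mixed integer programming model and then prove that SDSP-BRM is NP-hard. To solve the problem, we design a simple and effective heuristic algorithm (SEHA), in which a number of problem-tailored move operators are proposed for local search. Numerical results on a set of well-designed scenarios demonstrate the efficiency of the proposed algorithm in comparison with the general-purpose CPLEX solver. We conduct additional experiments to shed light on the impact of the segmentation strategy on the overall performance of the proposed SEHA.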
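The idea of splitting imaging data across playback windows can be illustrated with a small sketch. This is not SEHA and not the mixed integer programming model from the paper; it is a naive greedy fill, and the function name `split_across_windows`, the capacity-only view of a playback window, and the first-come ordering are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_across_windows(
    data_sizes: Dict[str, float],
    window_capacities: List[float],
) -> Dict[str, List[Tuple[int, float]]]:
    """Greedily fill playback windows in order, splitting data when a window fills up.

    Hypothetical simplification: a window is described only by its capacity,
    and data items are served in dictionary order.
    Returns, for each data item, a list of (window_index, amount) pieces.
    An item that does not fit in the remaining capacity gets a partial plan.
    """
    remaining = list(window_capacities)
    plan: Dict[str, List[Tuple[int, float]]] = {}
    window = 0
    for name, size in data_sizes.items():
        pieces: List[Tuple[int, float]] = []
        left = size
        while left > 1e-12 and window < len(remaining):
            take = min(left, remaining[window])
            if take > 0:
                pieces.append((window, take))
                remaining[window] -= take
                left -= take
            if remaining[window] <= 1e-12:
                window += 1  # this window is full, move on to the next one
        plan[name] = pieces
    return plan


if __name__ == "__main__":
    print(split_across_windows({"a": 3, "b": 4}, [5, 5]))
```

In the traditional SDSP setting, item `b` above could not use the 2 units left in the first window; allowing the split is what lets the leftover capacity be used.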
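Self-supervision has recently surged at its new frontier of graph learning. It facilitates graph representations that benefit downstream tasks, but its success can hinge on hand-crafted domain knowledge or on often expensive trial and error. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of these needs, because GraphCL uses a prefabricated prior reflected in the ad hoc manual selection of graph data augmentations. Our work aims to advance GraphCL by answering the following questions: How can the space of graph augmented views be represented? What principles can be relied on to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning? Accordingly, we extend the prefabricated discrete prior over the augmentation set to a learnable continuous prior in the parameter space of graph generators, assuming that graph priors themselves, similar to the concept of image manifolds, can be learned by data generation. Furthermore, to form contrastive views without collapsing to trivial solutions because of the learnability of the prior, we leverage the two principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors. Finally, contrastive learning, InfoMin, and InfoBN are integrated organically into one bi-level optimization framework. Our principled and automated approach proves competitive against state-of-the-art graph self-supervision methods, including GraphCL, on benchmarks of small graphs, and shows even better generalizability on large-scale graphs, without resorting to human expertise or downstream validation. Our code is publicly released at https://github.com/shen-lab/graphcl_automated.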
Generalizable, transferable, and robust representation learning on graph-structured data remains a challenge for current graph neural networks (GNNs). Unlike what has been developed for convolutional neural networks (CNNs) for image data, self-supervised learning and pre-training are less explored for GNNs. In this paper, we propose a graph contrastive learning (GraphCL) framework for learning unsupervised representations of graph data. We first design four types of graph augmentations to incorporate various priors. We then systematically study the impact of various combinations of graph augmentations on multiple datasets, in four different settings: semi-supervised, unsupervised, and transfer learning as well as adversarial attacks. The results show that, even without tuning augmentation extents or using sophisticated GNN architectures, our GraphCL framework can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. We also investigate the impact of parameterized graph augmentation extents and patterns, and observe further performance gains in preliminary experiments. Our code is available at: https://github.com/Shen-Lab/GraphCL.
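To make the contrastive objective concrete, the following is a minimal sketch of a normalized, temperature-scaled cross-entropy loss between two augmented views of the same batch of graphs. It is an illustration under stated assumptions, not the GraphCL implementation: the function names `cosine` and `contrastive_loss`, the one-directional form of the loss, and the default temperature are all chosen here for brevity, and the embeddings are plain Python lists standing in for GNN outputs.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contrastive_loss(
    view_a: Sequence[Sequence[float]],
    view_b: Sequence[Sequence[float]],
    temperature: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Average cross-entropy of picking the matching graph in view_b for each graph in view_a.

    view_a[i] and view_b[i] are embeddings of two augmentations of graph i
    (the positive pair); view_b[j] with j != i act as negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(a, a))                          # views agree
    print(contrastive_loss(a, [[0.0, 1.0], [1.0, 0.0]]))   # views swapped
```

The loss is small when each graph's two views are closer to each other than to the views of other graphs, which is the property the augmentations are meant to encourage.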
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large-scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed geographical coverage metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.
Many of the recent successful methods for video object segmentation (VOS) are overly complicated, heavily rely on fine-tuning on the first frame, and/or are slow, and are hence of limited practical use. In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. In order to segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame. In contrast to previous work, our embedding is only used as an internal guidance of a convolutional network. Our novel dynamic segmentation head allows us to train the network, including the embedding, end-to-end for the multiple object segmentation task with a cross entropy loss. We achieve a new state of the art in video object segmentation without fine-tuning with a J&F measure of 71.5% on the DAVIS 2017 validation set. We make our code and models available at https://github.com/tensorflow/models/tree/master/research/feelvos.
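One way to picture matching in a pixel-wise embedding space is a nearest-neighbour lookup: for each pixel of the current frame, find the distance to the closest first-frame pixel that belongs to the object. The sketch below shows only that lookup. The use of plain squared Euclidean distance, the flat list-of-pixels layout, and the function names are assumptions for illustration; the abstract does not specify these details, and this is not the FEELVOS code.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def global_matching_distances(
    current_embeddings: Sequence[Sequence[float]],
    first_frame_embeddings: Sequence[Sequence[float]],
    first_frame_is_object: Sequence[bool],
) -> List[float]:
    """For each current-frame pixel, distance to its nearest first-frame object pixel.

    Pixels are given as flat lists (hypothetical layout); first_frame_is_object[i]
    marks whether first-frame pixel i belongs to the object of interest.
    """
    reference = [
        emb
        for emb, is_object in zip(first_frame_embeddings, first_frame_is_object)
        if is_object
    ]
    if not reference:
        raise ValueError("the first frame contains no pixels of the object")
    return [
        min(squared_distance(pixel, ref) for ref in reference)
        for pixel in current_embeddings
    ]


if __name__ == "__main__":
    first = [[0.0, 0.0], [10.0, 10.0]]
    mask = [True, False]
    current = [[1.0, 0.0], [10.0, 10.0]]
    print(global_matching_distances(current, first, mask))
```

A small distance suggests the pixel resembles the object as it appeared in the first frame; in the method described above such a map serves only as guidance for the network rather than as the final prediction.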
In this work, we present a new computer vision task named video object of interest segmentation (VOIS). Given a video and a target image of interest, our objective is to simultaneously segment and track all objects in the video that are relevant to the target image. This problem combines the traditional video object segmentation task with an additional image indicating the content that users are concerned with. Since no existing dataset is perfectly suitable for this new task, we specifically construct a large-scale dataset called LiveVideos, which contains 2418 pairs of target images and live videos with instance-level annotations. In addition, we propose a transformer-based method for this task. We revisit Swin Transformer and design a dual-path structure to fuse video and image features. Then, a transformer decoder is employed to generate object proposals for segmentation and tracking from the fused features. Extensive experiments on the LiveVideos dataset show the superiority of our proposed method.
In this paper, the CONFIG algorithm, a simple and provably efficient constrained global optimization algorithm, is applied to optimize the closed-loop control performance of an unknown system with unmodeled constraints. Existing Gaussian process based closed-loop optimization methods either guarantee only local convergence (e.g., SafeOPT) or have no known optimality guarantee at all (e.g., constrained expected improvement), whereas the recently introduced CONFIG algorithm has been proven to enjoy a theoretical global optimality guarantee. In this study, we demonstrate the effectiveness of the CONFIG algorithm in applications. The algorithm is first applied to an artificial numerical benchmark problem to corroborate its effectiveness. It is then applied to a classical constrained steady-state optimization problem of a continuous stirred-tank reactor. Simulation results show that our CONFIG algorithm can achieve performance competitive with the popular CEI (Constrained Expected Improvement) algorithm, which has no known optimality guarantee. As such, the CONFIG algorithm offers a new tool, with both a provable global optimality guarantee and competitive empirical performance, to optimize the closed-loop control performance for a system with soft unmodeled constraints. Last, but not least, the open-source code is available as a Python package to facilitate future applications.
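Motion transfer aims to transfer the motion of a driving video to a source image. When there are considerable differences between the object in the driving video and the object in the source image, traditional single-domain motion transfer approaches often produce notable artifacts; for example, the synthesized image may fail to preserve the human shape of the source image (cf. Fig. 1(a)). To address this issue, we propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer, in which we regularize the object in the synthesized image to capture the motion of the object in the driving frame while still preserving the shape and appearance of the object in the source image. On one hand, considering that the object shapes in the synthesized image and the driving frame may differ, we design a shape-invariant motion adaptation module that enforces consistency of the angles of object parts in the two images to capture the motion information. On the other hand, we introduce a structure-guided appearance consistency module designed to regularize the similarity between corresponding patches of the synthesized image and the source image without affecting the learned motion in the synthesized image. Our proposed MAA model can be trained end-to-end with a cyclic reconstruction loss and ultimately produces satisfactory motion transfer results (cf. Fig. 1(b)). We conduct extensive experiments on the human dancing datasets Mixamo-Video to Fashion-Video and the human face datasets Vox-Celeb to CUFS; on both, our MAA model outperforms existing methods quantitatively and qualitatively.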
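Image animation aims to animate a source image using motion learned from a driving video. Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information, such as motion keypoints and corresponding local transformations. However, these CNN-based methods do not explicitly model the interactions between motions. As a result, important underlying motion relationships may be neglected, which can lead to noticeable artifacts in the generated animation video. To this end, we propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer. More specifically, we introduce two types of tokens in our proposed method: i) image tokens formed from patch features and corresponding position encodings; and ii) motion tokens encoded with motion information. Both types of tokens are fed into vision transformers to promote the underlying interactions between them through multi-head self-attention blocks. By adopting this process, the motion information can be better learned to boost model performance. The final embedded motion tokens are then used to predict the corresponding motion keypoints and local transformations. Extensive experiments on benchmark datasets show that our proposed method achieves promising results compared with state-of-the-art baselines. Our source code will be made publicly available.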
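The interaction between the two token types can be sketched as self-attention over their concatenation, after which only the motion-token outputs are kept. This is a deliberately reduced illustration and not the motion transformer itself: it uses a single head with no learned query, key, or value projections, and the names `self_attention` and `mix_tokens` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention without learned projections."""
    dim = len(tokens[0])
    outputs: List[Vector] = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(weights[j] * tokens[j][c] for j in range(len(tokens)))
                for c in range(dim)
            ]
        )
    return outputs


def mix_tokens(image_tokens: List[Vector], motion_tokens: List[Vector]) -> List[Vector]:
    """Let image and motion tokens attend to each other; return updated motion tokens."""
    mixed = self_attention(image_tokens + motion_tokens)
    return mixed[len(image_tokens):]


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]
    motion_tokens = [[0.5, 0.5]]
    print(mix_tokens(image_tokens, motion_tokens))
```

Each returned motion token is a weighted average of all tokens, so it carries information from the image patches as well as from the other motion tokens; in a full model a prediction head would then map these tokens to keypoints and local transformations.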
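Efficient global optimization is a widely used method for optimizing expensive black-box functions, with applications such as tuning hyperparameters and designing new materials. Despite its popularity, little attention has been paid to analyzing the inherent hardness of the problem, although, given its extensive use, it is important to understand the fundamental limits of efficient global optimization algorithms. In this paper, we study the worst-case complexity of the efficient global optimization problem and, in contrast to existing kernel-specific results, derive a unified lower bound on the complexity of efficient global optimization in terms of the metric entropy of a ball in the corresponding reproducing kernel Hilbert space (RKHS). Specifically, we show that if there exists a deterministic algorithm that achieves a suboptimality gap smaller than $\epsilon$ for any function $f \in S$ in $T$ function evaluations, then $T$ must be at least $\Omega\left(\frac{\log \mathcal{N}(S(\mathcal{X}), 4\epsilon, \|\cdot\|_\infty)}{\log(\frac{R}{\epsilon})}\right)$, where $\mathcal{N}(\cdot,\cdot,\cdot)$ is the covering number, $S$ is the ball centered at $0$ with radius $R$ in the RKHS, and $S(\mathcal{X})$ is the restriction of $S$ to the feasible set $\mathcal{X}$. Moreover, we show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and for the Matérn kernel with a large smoothness parameter $\nu$, up to a replacement of $d/2$ by $d$ and a logarithmic term $\log\frac{R}{\epsilon}$. That is, our lower bound is nearly optimal for these kernels.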